Extending the Four Russian Algorithm to Compute the Edit Script in Linear Space

Authors

  • Vamsi Kundeti
  • Sanguthevar Rajasekaran
Abstract

Computing the edit distance between two strings is one of the most fundamental problems in computer science. Algorithms based on edit distance are used extensively in the alignment of biological sequences, so any improvement in either time or space for this problem is highly desirable. The standard dynamic programming algorithm for computing the edit distance of two strings S1 = [a1, a2, a3, ..., an] and S2 = [b1, b2, b3, ..., bn] takes O(n^2) time and uses O(n^2) space to compute the value of the edit distance. Within these resource bounds, it also computes the actual edit script (i.e., a sequence of inserts, deletes, and changes that transforms S1 into S2). For several problems, such as sequence alignment, the edit script is often more important than the value of the edit distance. The first major improvement in the asymptotic running time for computing the value of the edit distance was achieved in [KF70]. This algorithm is widely known as the Four Russian Algorithm; it improves the running time by a factor of O(log n), to O(n^2 / log n), but computes only the value of the edit distance. It does not address the problem of computing the actual edit script, which is of wider interest than the value alone. Hirschberg [Hir77] gave an algorithm that computes the actual script in O(n^2) time. In [RA03], linear-space parallel algorithms for the sequence alignment problem were given; however, they assume that O(n^2) is the optimal asymptotic complexity of the sequential algorithm. In this paper we present algorithms that compute both the edit script and the value in O(n^2 / log n) time using O(n) space.
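For orientation, the standard dynamic programming baseline that the abstract refers to can be sketched as follows. This is only an illustrative sketch of the textbook quadratic-time, quadratic-space method, not the algorithm of the paper; the function names `edit_script` and `apply_script`, the unit cost per operation, and the tuple encoding of operations are assumptions made for this example.

```python
def edit_script(s1, s2):
    """Textbook DP: returns (distance, script) in O(n^2) time and space.

    Assumed encoding (hypothetical, for illustration only):
      ("change", k, c)  replace s1[k] with c
      ("delete", k, c)  delete s1[k] (which equals c)
      ("insert", k, c)  insert c before position k of s1
    Positions refer to the original, unmodified s1.
    """
    n, m = len(s1), len(s2)
    # dist[i][j] = edit distance between s1[:i] and s2[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete
                             dist[i][j - 1] + 1,         # insert
                             dist[i - 1][j - 1] + cost)  # match or change

    # Trace back from the bottom-right corner to recover one optimal script.
    script = []
    i, j = n, m
    while i > 0 or j > 0:
        cost = 1
        if i > 0 and j > 0:
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost:
            if cost == 1:
                script.append(("change", i - 1, s2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            script.append(("delete", i - 1, s1[i - 1]))
            i -= 1
        else:
            script.append(("insert", i, s2[j - 1]))
            j -= 1
    script.reverse()
    return dist[n][m], script


def apply_script(s1, script):
    """Apply a script from edit_script to s1, working right to left
    so that positions in the original string stay valid."""
    chars = list(s1)
    for op, k, c in reversed(script):
        if op == "change":
            chars[k] = c
        elif op == "delete":
            del chars[k]
        else:  # insert
            chars.insert(k, c)
    return "".join(chars)


distance, script = edit_script("kitten", "sitting")
print(distance, apply_script("kitten", script))
```

The full table `dist` is what makes the traceback possible here, and it is also what costs O(n^2) space; keeping only two rows would give the value in linear space but would lose the information this traceback relies on.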

Related resources

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science at the University of Connecticut 2008

The string editing problem is one of the most fundamental problems in computer science and is used extensively in bioinformatics. In this thesis we present two new results relevant to the string editing problem. First, we show how to simultaneously compute both the edit distance and the edit script between two strings of length n in O(n^2 / log n) time and O(n) space. The Four Russian algorithm com...

An Algorithm to Compute the Complexity of a Static Production Planning (RESEARCH NOTE)

Complexity is one of the most important issues in any production planning. An increase in the complexity of production planning can cause inconsistency between a production plan and the actual outcome. Complexity can generally be divided into two categories, static complexity and dynamic complexity, which can be computed using the entropy formula. The formula considers the probability of...

Noisy Subsequence Recognition Using Constrained String Editing Involving Arbitrary Operations*

We consider a problem which can greatly enhance the areas of cursive script recognition and the recognition of printed character sequences. This problem involves recognizing words/strings by processing their noisy subsequences. Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We study the problem of estimating X* by processing Y, a noisy version o...

An XML Document Transformation Algorithm Inferred from an Edit Script between DTDs

Finding an appropriate data transformation between two schemas has been an important problem. In this paper, assuming that an edit script between original and updated DTDs is available, we consider inferring a transformation algorithm, which transforms each document valid against the original DTD into a document valid against the updated DTD, from the original DTD and the edit script. We first ...

Optimization of Mixed-Integer Non-Linear Electricity Generation Expansion Planning Problem Based on Newly Improved Gravitational Search Algorithm

Electricity demand is forecasted to double in 2035, and it is vital to address the economics of electrical energy generation for planning purposes. This study aims to examine the applicability of Gravitational Search Algorithm (GSA) and the newly improved GSA (IGSA) for optimization of the mixed-integer non-linear electricity generation expansion planning (GEP) problem. The performance index of GEP...

Publication date: 2008